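From Task-Specific AI to General-Purpose Large Language Models

The Paradigm Shift in Artificial Intelligence

1. From Specific to General

The AI field has undergone a major change in how models are trained and deployed.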

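Old Paradigm (task-specific training): Models such as early CNNs, or BERT once fine-tuned, were each specialized for a single purpose (for example, sentiment analysis only). Translation, summarization, and other tasks each required a separate model.
New Paradigm (centralized pre-training + prompting): A single very large model (a large language model, or LLM) learns general world knowledge from an internet-scale dataset. After that, changing only the input prompt lets it perform almost any language-processing task.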
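
To make the "change only the prompt" idea concrete, here is a minimal sketch. The helper `build_prompt`, the `TASK_TEMPLATES` table, and the template wording are all hypothetical names invented for illustration; the call to an actual model is deliberately left out, since it depends on whichever LLM you use.

```python
# Hypothetical sketch: one general model, many tasks, selected only by the prompt.
# TASK_TEMPLATES and build_prompt are illustrative assumptions, not a library API.

TASK_TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of the following text as positive or negative.\n\n"
        "Text: {text}\nSentiment:"
    ),
    "translate": (
        "Translate the following text into French.\n\n"
        "Text: {text}\nTranslation:"
    ),
    "summarize": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\nSummary:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt that would be sent to a single general-purpose LLM."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return TASK_TEMPLATES[task].format(text=text)


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    # The same input text is routed to three different tasks purely by prompt.
    for task in TASK_TEMPLATES:
        print(f"--- {task} ---")
        print(build_prompt(task, review))
```

Under the old paradigm, each of these three tasks would have needed its own trained model; here the only thing that changes is the string handed to one model.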

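2. Architectural Evolution

Encoder-only (BERT era): Specialized for understanding and classification. These models read text bidirectionally to capture deep context, but they are not designed to generate new text.
Decoder-only (GPT/Llama era): The modern standard for generative AI. These models predict the next token auto-regressively, which makes them well suited to open-ended generation and conversation.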
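
One way to picture the difference between the two architectures is the attention mask, which says which positions each token may look at. The sketch below is a simplified illustration in plain Python, not the implementation of any specific model; the function names are made up for this example, and 1 means "may attend" while 0 means "blocked".

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every token may attend to every other token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: token i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", "down"]
    print("Encoder-only (bidirectional):")
    for row in bidirectional_mask(len(tokens)):
        print(row)
    print("Decoder-only (causal):")
    for row in causal_mask(len(tokens)):
        print(row)
```

The causal mask is lower-triangular: because no token can see what comes after it, the model can be trained to predict the next token and then used to generate text one token at a time.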

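3. Key Drivers of the Shift

Self-supervised learning: Training on vast amounts of unlabeled internet data removed the bottleneck of human labeling, because the training target (such as the next token) comes from the text itself.
Scaling laws: The empirical finding that model performance improves predictably, roughly as a power law rather than in simple proportion, as model size (parameter count), data volume, and compute increase.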
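
The "predictable" part of scaling laws can be sketched with a simple power law in the number of parameters. The functional form below is a common way to write such a law; the constants `n_c` and `alpha` are made-up placeholders for illustration, not fitted values from any study.

```python
def power_law_loss(n_params: float, n_c: float = 1e14, alpha: float = 0.1) -> float:
    """Illustrative power law: loss = (n_c / n_params) ** alpha.

    n_c and alpha are hypothetical constants chosen only for this sketch.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


if __name__ == "__main__":
    # Each 10x increase in parameters multiplies the loss by the same
    # constant factor, 10 ** -alpha, which is what makes the trend predictable.
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"params={n:.0e}  loss={power_law_loss(n):.3f}")
```

On a log-log plot this relationship is a straight line, which is why results from small models can be extrapolated to estimate the performance of larger ones. Analogous curves are usually written for data volume and compute.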

Key Insight

AI has evolved from "task-specific tools" into "general-purpose agents with emergent abilities such as reasoning and in-context learning."


Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to Scaling Laws, what three factors fundamentally link to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only (like BERT) or a Decoder-only (like GPT) architecture.

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.
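
As a rough illustration of auto-regressive next-token prediction, the toy sketch below learns next-token counts from raw text and then generates greedily, feeding each predicted token back in as the new context. It is a deliberately tiny bigram counter, not how GPT or Llama work internally: a real decoder-only model conditions on the whole preceding sequence, whereas this toy looks only at the last token. The function names and the sample corpus are invented for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(tokens: List[str]) -> Dict[str, Counter]:
    """Count which token follows which. The labels come from the text itself."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: Dict[str, Counter], start: str, max_new_tokens: int) -> List[str]:
    """Greedy auto-regressive loop: predict, append, repeat."""
    output = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no continuation was seen for this token during training
        next_token = followers.most_common(1)[0][0]
        output.append(next_token)
    return output


if __name__ == "__main__":
    corpus = "the cat sat on the mat . the cat sat on the sofa .".split()
    model = train_bigram(corpus)
    print(" ".join(generate(model, "the", 4)))
```

The loop structure (predict one token, append it, predict again) is the part that carries over to large decoder-only models; only the predictor is vastly more capable.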